Optimizing the NPB CG benchmark for multi-core AMD Opteron microprocessors
Abstract
CG approximates the largest eigenvalue of a sparse, symmetric, positive definite matrix, using inverse iteration [3]. The matrix is generated by summing outer products of sparse vectors, with a fixed number of nonzero elements in each generating vector. The matrix sizes and total number of nonzero elements (“computed nonzeros,” following [3]) are listed in Table 1. The benchmark computes a given number of eigenvalue estimates, referred to as “outer iterations,” using 25 iterations of the conjugate gradient method to solve the linear system in each outer iteration.
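The structure described above (a matrix built from outer products of sparse vectors, then outer iterations that each run 25 conjugate gradient steps) can be illustrated with a minimal sketch. This is not the benchmark's code. The function names, the dense list-of-lists storage, the identity term added to keep the matrix positive definite, and the early-exit tolerance in the CG loop are all assumptions made for illustration, and the benchmark's shift handling is not reproduced. The value returned is 1/(x·z), the reciprocal of the quantity inverse iteration drives toward the dominant eigenvalue of the inverse matrix.

```python
import math
import random


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(a, v):
    return [dot(row, v) for row in a]


def make_matrix(n, nvecs, nnz_per_vec, seed=0):
    """Identity plus a sum of outer products of sparse vectors (dense storage).

    The identity term is an assumption of this sketch; it guarantees
    positive definiteness without modelling the benchmark's generator.
    """
    rng = random.Random(seed)
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(nvecs):
        # Fixed number of nonzeros in each generating vector.
        vals = {i: rng.uniform(-1.0, 1.0) for i in rng.sample(range(n), nnz_per_vec)}
        for i, vi in vals.items():
            for j, vj in vals.items():
                a[i][j] += vi * vj
    return a


def cg(a, b, iters=25):
    """Run up to `iters` conjugate gradient steps on a z = b, starting from z = 0."""
    z = [0.0] * len(b)
    r = b[:]
    p = r[:]
    rho = dot(r, r)
    rho0 = rho
    for _ in range(iters):
        # Early exit once the residual is negligible (sketch-only safeguard
        # against dividing by zero on tiny test matrices).
        if rho <= 1e-28 * rho0:
            break
        q = matvec(a, p)
        alpha = rho / dot(p, q)
        z = [zi + alpha * pi for zi, pi in zip(z, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rho_new = dot(r, r)
        beta = rho_new / rho
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rho = rho_new
    return z


def inverse_iteration(a, outer_iters=15, cg_iters=25):
    """Return the estimate 1/(x.z) after `outer_iters` outer iterations."""
    n = len(a)
    x = [1.0 / math.sqrt(n)] * n  # normalized starting vector
    estimate = None
    for _ in range(outer_iters):
        z = cg(a, x, cg_iters)        # approximately solve a z = x
        estimate = 1.0 / dot(x, z)    # one eigenvalue estimate per outer iteration
        norm = math.sqrt(dot(z, z))
        x = [zi / norm for zi in z]   # normalize for the next outer iteration
    return estimate


if __name__ == "__main__":
    a = make_matrix(n=40, nvecs=5, nnz_per_vec=4)
    print(inverse_iteration(a))
```

Dense storage keeps the sketch short; at the matrix sizes the benchmark uses, a sparse format would be needed, and the sparse matrix-vector product inside the CG loop is where nearly all of the run time goes.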
Similar resources
Observation and analysis of the multicore performance impact on scientific applications
With the proliferation of large multicore high-performance computing systems, application performance is often negatively affected. This paper provides benchmark results for a representative workload from the Department of Defense High-performance Computing Modernization Program. The tests were run on a Cray XT-3 and XT-4, which use dual- and quad-core AMD Opteron microprocessors. We use a combin...
Benchmarking CMSSW on Intel and AMD single-core, dual-core and quad-core systems
We have benchmarked dual-processor quad-core AMD Opteron 2350 and 2356, dual-processor quad-core Intel Xeon E5345, single processor quad-core Intel Xeon X5472, dual-processor dual-core AMD Opteron 2214, dual-processor single-core Intel Xeon EM64T and single-processor single-core Intel Xeon EM64T systems using a CMSSW event simulation and reconstruction application. The results are presented in ...
Understanding and Mitigating Multicore Performance Issues on the AMD Opteron Architecture
Over the past 15 years, microprocessor performance has doubled approximately every 18 months through increased clock rates and processing efficiency. In the past few years, clock frequency growth has stalled, and microprocessor manufacturers such as AMD have moved towards doubling the number of cores every 18 months in order to maintain historical growth rates in chip performance. This document...
A Scalability Study of Columbia using the NAS Parallel Benchmarks
The Columbia system at the NASA Advanced Supercomputing (NAS) facility is a cluster of 20 SGI Altix nodes, each with 512 Itanium 2 processors and 1 terabyte (TB) of shared-access memory. Four of the nodes are organized as a 2048-processor capability-computing platform connected by two low-latency interconnects: NUMALink4 (NL4) and InfiniBand (IB). To evaluate the scalability of Columbia with re...
A Comparative Performance Evaluation of Multi Processor Multi Core Server Processor Architectures on Enterprise Middleware Performance
In this paper we describe the performance evaluation and comparison of server based “Enterprise Middleware” frameworks on multi-processor multi-core server processor architectures. We experimented with a 'single processor quad core Intel Xeon' server processor and a 'dual processor dual core multiprocessor AMD Opteron'. Also we discuss the expected enterprise middleware framework execution performan...